treeKL: A distance between high dimension empirical distributions

Authors

  • Riwal Lefort
  • François Fleuret
Abstract

This paper offers a methodological contribution for computing the distance between two empirical distributions in a Euclidean space of very large dimension. We propose to use decision trees instead of relying on a standard quantification of the feature space. Our contribution is two-fold: we first define a new distance between empirical distributions, based on the Kullback-Leibler (KL) divergence between the distributions over the leaves of decision trees built for the two empirical distributions. Then, we propose a new procedure to build these unsupervised trees efficiently. The performance of this new metric is illustrated on image clustering and neuron classification. Results show that the tree-based method outperforms standard methods based on bag-of-features procedures.
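The abstract gives the idea but not the exact definition, so the following is only a minimal sketch of a tree-based KL distance, not the authors' procedure. It assumes a simplistic unsupervised tree (a random coordinate split at its median), additive smoothing of leaf counts (`eps`), and a symmetrised sum of the two KL terms. The names `build_tree`, `leaf_distribution`, and `tree_kl_sketch` are hypothetical.

```python
import math
import random


def build_tree(points, max_depth, rng, min_size=2):
    """Grow an unsupervised tree on `points` (a list of equal-length lists).

    Assumption: each node splits a randomly chosen coordinate at its median.
    The paper's actual tree-building procedure is not described in the abstract.
    """
    n_leaves = [0]

    def new_leaf():
        leaf = ("leaf", n_leaves[0])
        n_leaves[0] += 1
        return leaf

    def grow(pts, depth):
        if depth == max_depth or len(pts) < 2 * min_size:
            return new_leaf()
        dim = rng.randrange(len(pts[0]))
        values = sorted(p[dim] for p in pts)
        threshold = values[len(values) // 2]
        left = [p for p in pts if p[dim] < threshold]
        right = [p for p in pts if p[dim] >= threshold]
        if not left or not right:  # degenerate split, stop here
            return new_leaf()
        return ("split", dim, threshold, grow(left, depth + 1), grow(right, depth + 1))

    root = grow(points, 0)
    return root, n_leaves[0]


def leaf_of(node, x):
    """Route a point down the tree and return the index of its leaf."""
    while node[0] == "split":
        _, dim, threshold, left, right = node
        node = left if x[dim] < threshold else right
    return node[1]


def leaf_distribution(tree, n_leaves, points, eps=1e-3):
    """Smoothed empirical distribution of `points` over the leaves."""
    counts = [eps] * n_leaves  # eps avoids log(0) for empty leaves
    for p in points:
        counts[leaf_of(tree, p)] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl(p, q):
    """Discrete Kullback-Leibler divergence KL(p || q)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def tree_kl_sketch(a, b, max_depth=4, seed=0):
    """Sum of KL(a || b) on a tree built from `a` and KL(b || a) on a tree built from `b`."""
    rng = random.Random(seed)
    total = 0.0
    for ref, other in ((a, b), (b, a)):
        tree, n_leaves = build_tree(ref, max_depth, rng)
        p = leaf_distribution(tree, n_leaves, ref)
        q = leaf_distribution(tree, n_leaves, other)
        total += kl(p, q)
    return total
```

Calling `tree_kl_sketch(a, b)` on two lists of feature vectors returns a non-negative score that is zero when the two samples are identical. The depth, the smoothing constant, and the split rule are all free choices in this sketch.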

Similar resources

On the Mahalanobis-distance based penalized empirical likelihood method in high dimensions

In this paper, we consider the penalized empirical likelihood (PEL) method of Bartolucci (2007) for inference on the population mean which is a modification of the standard empirical likelihood and employs a penalty based on the Mahalanobis-distance. We derive the asymptotic distributions of the PEL ratio statistic when the dimension of the observations increases with the sample size. Finite sa...

Empirical investigation of tourists' perceived psychic distance of Iran as a tourism destination

The aim of the current study was to investigate the perceived psychic distance of potential tourists in relation to Iran as a tourism destination. The concept of psychic distance refers to perceived similarities/differences between a specific destination and the tourist's home country. The members of the couch-surfing virtual community participated in this study. The statistical data were collected by conve...

Testing for Equal Distributions in High Dimension

We propose a new nonparametric test for equality of two or more multivariate distributions based on Euclidean distance between sample elements. Several consistent tests for comparing multivariate distributions can be developed from the underlying theoretical results. The test procedure for the multisample problem is developed and applied for testing the composite hypothesis of equal distributio...

Wasserstein Distance Measure Machines

This paper presents a distance-based discriminative framework for learning with probability distributions. Instead of using kernel mean embeddings or generalized radial basis kernels, we introduce embeddings based on dissimilarity of distributions to some reference distributions denoted as templates. Our framework extends the theory of similarity of Balcan et al. (2008) to the population distri...

Relations between Renyi Distance and Fisher Information

In this paper, we first show that Renyi distance between any member of a parametric family and its perturbations, is proportional to its Fisher information. We, then, prove some relations between the Renyi distance of two distributions and the Fisher information of their exponentially twisted family of densities. Finally, we show that the partial ordering of families induced by Renyi dis...

Journal:
  • Pattern Recognition Letters

Volume 34, issue not listed

Pages: not listed

Publication date: 2013